Abstract

In this study, two homology models of the main proteinase (Mpro) from the novel coronavirus associated with severe acute
respiratory syndrome (SARS-CoV) were constructed. These models reveal three distinct functional domains, with an intervening
loop connecting domains II and III and a catalytic cleft, containing the substrate binding subsites S1 and S2, between domains I
and II. S2 exhibits greater structural variation than S1 during the 200 ps molecular dynamics simulations because it is located
at the open mouth of the catalytic cleft and the amino acid residues lining this subsite are the least conserved. In addition,
the higher structural variation of S2 makes it flexible enough to accommodate a bulky hydrophobic residue from the substrate.
© 2004 Elsevier B.V. All rights reserved. 


1. Introduction 


Coronaviruses belong to a diverse group of positive-stranded
RNA viruses and share a similar genome organization and
common transcriptional/translational processes with
Arteriviridae [1,2]. The human coronavirus HCoV-229E
replicase gene encodes two overlapping polyproteins [3]
that mediate all the functions required for viral
replication and transcription [4]. The functional
polypeptides are released from the polyproteins by
extensive proteolytic processing, which is primarily
achieved by the 33.1-kDa main proteinase (Mpro) [5].
Mpro from HCoV-229E (MproH) has been biosynthesized
in Escherichia coli, and its enzymatic properties have been
well characterized [5,6].

Several studies have revealed significant differences
in both the active sites and domain structures of Mpro
from coronaviruses and picornaviruses [6-8]. Previous
experimental data have shown that differential
cleavage kinetics is a conserved feature of Mpro in all
coronaviruses [9]. Furthermore, the cleavage pattern
appears to be conserved in Mpro from SARS-CoV
(MproS) and from other coronaviruses [10], as deduced
from the genome sequence [11,12]. The functional
importance of Mpro in the viral life cycle has made it
an attractive target for the development of drugs directed
against SARS and other coronavirus infections.
Thus, screening known proteinase inhibitor libraries
may be a useful shortcut to the discovery of anti-SARS
drugs [13]. Crystal structures of MproH [10]
and of Mpro from porcine coronavirus (transmissible
gastroenteritis virus, TGEV) (MproT) complexed with its
inhibitor [14] have been determined. Comparison of
these structures reveals a remarkable degree of
structural conservation.

Previously, several molecular dynamics (MD) simulation,
homology modeling, and molecular docking
experiments have been conducted in our group
[15-18]. In this Letter, two homology models of MproS
(denoted as MproSH and MproST) were constructed
based on the crystal structures of MproH [10] and MproT
[14], respectively. In addition, MD simulations were
performed to investigate the dynamic behavior of these
structures.


2. Methods 
2.1. Template proteins 


The atomic coordinates of MproT and MproH were
obtained from the Protein Data Bank (PDB; 1lvo and
1p9u, respectively). Unfavorable non-physical contacts
in these structures were eliminated using the Biopolymer
module of Insight II (Accelrys, San Diego, CA, USA)
with the CVFF forcefield [19] on an SGI O2+ workstation
with a 64-bit MIPS RISC R12000 270 MHz CPU
and a PMC-Sierra RM7000A 350 MHz processor (Silicon
Graphics, Inc., Mountain View, CA, USA), followed by
10000 iterations of energy minimization using the steepest
descent method.
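
The steepest descent idea can be illustrated with a minimal sketch. It is not the Insight II implementation: the one-dimensional harmonic "bond" energy, the force constant K_BOND, the equilibrium length R_EQ, and the fixed step size are all hypothetical choices made only to show how coordinates are moved repeatedly against the energy gradient until the gradient becomes negligible or the iteration limit is reached.

```python
def steepest_descent(grad, x0, step=0.01, max_iter=10000, tol=1e-8):
    """Minimize an energy by repeatedly stepping against its gradient.

    grad     : function returning the gradient as a list of floats
    x0       : starting coordinates
    step     : fixed step size (an assumption; real codes adapt it)
    max_iter : iteration limit (10000, as in the text)
    tol      : stop when the largest gradient component falls below this
    """
    x = list(x0)
    for iteration in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            return x, iteration
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iter


# Hypothetical toy energy: E(r) = 0.5 * K_BOND * (r - R_EQ)**2
K_BOND = 100.0  # illustrative force constant, not a CVFF parameter
R_EQ = 1.5      # illustrative equilibrium length


def bond_gradient(x):
    """Gradient dE/dr of the toy harmonic energy."""
    return [K_BOND * (x[0] - R_EQ)]


minimum, n_iter = steepest_descent(bond_gradient, [2.0], step=0.005)
print(minimum[0], n_iter)
```

A fixed step size is used here for brevity; production minimizers normally adjust the step from the energy change between iterations.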


2.2. Structural homology 


The procedures for amino acid sequence alignment
and homology modeling were described previously
[18]. The newly built homology models were substantially
refined to remove van der Waals overlaps,
unfavorable atomic distances, and undesirable torsion
angles using the molecular mechanics and dynamics features
of the Discover module.


2.3. Molecular dynamics simulations 


The present MD simulations were performed with the
CVFF forcefield [19]. The crystal structures of MproH
and MproT and the homology models MproSH and
MproST were subjected to energy minimization.
Each energy-minimized structure was placed in
the center of a 50 × 60 × 85 Å³ lattice
filled with 6222, 5866, 5836, and 5776 water molecules
for the MproH, MproT, MproSH, and MproST
systems, respectively. To randomize the arrangement of
the soaked water molecules, the water molecules alone were
subjected to 10000 iterations of conjugate gradient
minimization with the protein atoms kept fixed. The
system composed of the minimized protein structure
and water molecules was then used as the starting
image. Finally, a 200 ps MD simulation with a 5 ps
equilibration step was carried out for each system using
the Discover module of Insight II. The explicit-image
periodic boundary condition (PBC) was used for
solvent equilibration. The temperature and pressure were
maintained at 300 K and one atmosphere, respectively,
for each MD simulation, as described by Berendsen
et al. [20]. A cut-off radius of 10 Å was applied to the
non-bonded interactions. The time step of the
MD simulations was 1 fs. The trajectories and coordinates
of these structures were recorded every 2 ps for
further analysis.
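
As a rough check of the simulation bookkeeping, the sketch below derives the number of integration steps and saved snapshots from the values stated above (1 fs time step, 200 ps, one snapshot every 2 ps). It also evaluates the velocity scaling factor of the Berendsen weak-coupling scheme, lambda = sqrt(1 + (dt/tau)(T0/T - 1)). The coupling time tau_fs and the instantaneous temperature of 310 K are assumptions chosen for illustration; neither is given in the text.

```python
import math

TIME_STEP_FS = 1.0       # from the text
TOTAL_TIME_PS = 200.0    # from the text
SAVE_INTERVAL_PS = 2.0   # from the text
TARGET_TEMP_K = 300.0    # from the text


def count_steps(total_ps, step_fs):
    """Number of integration steps needed to cover total_ps."""
    return round(total_ps * 1000.0 / step_fs)


def count_saved_intervals(total_ps, interval_ps):
    """Number of save intervals (snapshots after the starting image)."""
    return round(total_ps / interval_ps)


def berendsen_scale(current_temp, target_temp, step_fs, tau_fs):
    """Velocity scaling factor of the Berendsen weak-coupling thermostat."""
    ratio = target_temp / current_temp - 1.0
    return math.sqrt(1.0 + (step_fs / tau_fs) * ratio)


n_steps = count_steps(TOTAL_TIME_PS, TIME_STEP_FS)
n_saved = count_saved_intervals(TOTAL_TIME_PS, SAVE_INTERVAL_PS)

# tau_fs=100.0 and the 310 K instantaneous temperature are hypothetical.
scale = berendsen_scale(310.0, TARGET_TEMP_K, TIME_STEP_FS, tau_fs=100.0)
print(n_steps, n_saved, scale)
```

When the instantaneous temperature exceeds the target, the factor is slightly below one, so velocities are gently reduced at each step rather than reset.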


3. Results and discussion 
3.1. Amino acid sequence alignment 


The amino acid sequence alignment of MproS with
MproT and MproH is given in Fig. 1. The residue
corresponding to Ala46 in domain I of MproS and
those corresponding to Asp248, Ile249, and Gln273 in
domain III of MproS are missing in both MproT and
MproH. In addition, MproS has one and two extra
residues at the C-terminus compared with MproT
and MproH, respectively. Domain III exhibits the highest
sequence variation among the three domains. Both
the general acid-base catalyst (His residue in domain I)
and the nucleophile (Cys residue in domain II) are
fully conserved in these three proteins.

Fig. 1. Amino acid sequence alignment of MproT, MproH, and MproS. Secondary structures defined in the crystal structure of MproT are shown on top.

Table 1 lists the percentages of amino acid identity
among these proteins. MproT and MproH show the highest
total amino acid identity (60.80%), whereas MproH
and MproS exhibit the lowest total amino acid identity
(40.19%). In addition, domain II has the highest amino
acid identity, whereas domain III shows the lowest amino
acid identity among these three proteins. The low
sequence identities between MproS and MproT and between
MproS and MproH found in the present study are
in good agreement with previous results [21],
in which SARS-CoV was classified as a new group of
coronavirus based on the analysis of the deduced genome
sequence.

Table 1
The amino acid sequence identities among MproH, MproT, and MproS

                   Identity (%)
                   Total   Domain I   Domain II   Domain III
MproH and MproT    60.80   63.44      65.06       55.45
MproH and MproS    40.19   41.94      45.78       35.64
MproT and MproS    43.85   44.09      49.40       39.22
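
For readers who wish to reproduce identity percentages of this kind, a minimal sketch follows. It assumes that identity is counted over alignment columns in which neither sequence has a gap; the text does not state which denominator was used, so this convention is an assumption. The two short aligned sequences are hypothetical and are not taken from Fig. 1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment: 6 gap-free columns, 5 of them identical.
toy_a = "ACDE-FGH"
toy_b = "ACDQWFG-"
toy_identity = percent_identity(toy_a, toy_b)
print(round(toy_identity, 2))
```

Per-domain values such as those in Table 1 would follow from applying the same function to the alignment columns belonging to each domain.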


3.2. The homology models of MproST and MproSH


The homology models of MproST and MproSH are
illustrated in Figs. 2a and b, respectively. Both MproST
and MproSH exhibit three distinct domains and adopt
folds similar to those of MproT and MproH, respectively. These
models are comparable to the homology models
constructed previously [10,13].
The quality of the geometry and of the stereochemistry
of these homology models was further validated using
the Homology/ProStat/Struct_Check command of Insight
II. A total of 97% and 96% of the backbone dihedral
angle (φ and ψ) densities are located within the structurally
favorable regions of the Ramachandran plot for
MproST and MproSH, respectively. The calculation of
main chain torsion angles (χ1 and χ2) of these models
showed no severe distortion of the backbone geometry.
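
The backbone dihedral angles used in a Ramachandran analysis can be computed from four consecutive atom positions. The sketch below is a generic implementation of the standard atan2 formula and is not the Struct_Check routine; the coordinates in the example are hypothetical points chosen so that the result is easy to verify by hand. For residue i, φ is taken over C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points giving a right-angle twist about the p1-p2 bond.
angle = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(angle)  # 90.0
```

The fraction of residues in favorable regions then depends on how those regions are delimited, which the text does not specify.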

The putative substrate binding subsites S1 and S2 of
MproST and MproSH are located in a cleft between
domains I and II and are nearly identical to those of
MproT and MproH (Fig. 2). This indicates that MproS
may follow a substrate binding mechanism similar to that
of MproT and MproH, which would allow anti-SARS
drugs to be designed by simply screening known proteinase
inhibitors. The low sequence identity and secondary structure
conservation in domain III among these proteins suggest
that it may play a minor role in proteolytic activity. As
shown in Table 2, the RMSDs of MproSH and MproST
are 4.84 and 3.94 Å relative to their corresponding
templates, MproH and MproT, respectively, while the
RMSD between MproSH and MproST is 5.78 Å. This
indicates that the structure of MproS is more similar to that
of MproT.

Fig. 2. The homology models of (a) MproST and (b) MproSH visualized by Insight II. α-Helices and β-strands are shown as red cylinders and yellow
arrows, respectively. The general acid-base catalyst His residue and the nucleophilic Cys residue are labeled. The locations of the putative substrate
binding subsites S1 and S2 are indicated.

Table 2
The RMSDs between the template proteins, MproH and MproT, and
the homology models, MproSH and MproST

          RMSD (Å)
          MproH   MproT   MproSH
MproT     2.01    -       -
MproSH    4.51    3.94    -
MproST    4.84    4.37    5.78
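
The RMSD between two structures can be sketched as follows. This minimal version assumes that the two coordinate sets contain the same atoms in the same order and have already been superimposed; the text does not describe the superposition or atom selection used, so both are assumptions. The coordinates in the example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate sets.

    Both inputs are sequences of (x, y, z) tuples in Å. No superposition
    is performed here; the structures are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(reference, model)
print(toy_rmsd)  # 1.0
```

The same function applies to trajectory analysis: comparing each saved snapshot with the starting structure gives an RMSD versus time curve.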


3.3. Molecular dynamics simulations 


As shown in Fig. 3, these structures remained considerably
stable during the MD time course, with the
root-mean-square deviations (RMSDs) remaining within 3 Å.
Domain III exhibits higher structural
variation than the other two domains in all cases. S1
was found to maintain its structural integrity, whereas
S2 exhibited higher structural fluctuations throughout the
MD simulations. This is attributed to the fact that S2 is located
at the open mouth of the catalytic cleft between
domains I and II, whereas S1 is situated at the very bottom
of this cleft and is well protected by the hydrophobic
core. The higher structural variation of S2 makes it
flexible enough to accommodate a bulky hydrophobic
residue from the substrate.

In the crystal structures, the distance between the
sulfur atom of Cys144 and the Nε2 of His41 in MproT is
4.05 Å [14], longer than the corresponding Cys-His
distances in HAV 3Cpro (3.92 Å) [22], poliovirus (PV) 3Cpro
(3.4 Å) [23], and papain (3.65 Å) [24]. From a dynamics
point of view (Fig. 4), the Cys144-His41 distance of
MproH fluctuated more rapidly than that of MproT. In
addition, the Cys145-His41 distance of MproSH fluctuated
more rapidly than that of MproST beyond 150 ps.
These results indicate that both MproT and MproST
may exhibit more stable active site configurations than
MproH and MproSH. The large fluctuation
of these Cys-His distances may indicate that the
structure of the catalytic site is not stable when it is
not protected by a bound substrate or ligand. This
result is in very good agreement with the previous
finding that there are significant differences in the flexibility
of the active site of the SARS-CoV proteinase [25].
Furthermore, the high flexibility of the active site may allow
these proteins to execute the catalytic process more
efficiently.

Fig. 3. The RMSDs of the backbone Cα for (a) the whole protein, (b) domain I, (c) domain II, (d) domain III, (e) S1, and (f) S2 of MproT, MproH,
MproST, and MproSH during MD simulations.

Fig. 4. The linear distance between the sulfur atom of the nucleophilic
Cys residue and the Nε2 of the general acid-base catalyst His residue as
a function of MD simulation time.
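
A distance trace of this kind can be derived from saved snapshots as sketched below. The positions are hypothetical and merely stand in for the Cys sulfur and His nitrogen coordinates extracted from each frame. Two simple descriptors are computed under stated assumptions: the population standard deviation as a measure of fluctuation amplitude, and the mean absolute change between successive frames as a crude measure of how rapidly the distance varies. The text does not state which measure underlies "fluctuated more rapidly".

```python
import math
import statistics


def distance_series(sulfur_positions, nitrogen_positions):
    """Cys S to His N distance (Å) for each saved frame."""
    return [math.dist(s, n)
            for s, n in zip(sulfur_positions, nitrogen_positions)]


def fluctuation_amplitude(series):
    """Population standard deviation of the distance trace."""
    return statistics.pstdev(series)


def mean_frame_change(series):
    """Mean absolute change between successive frames."""
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(changes) / len(changes)


# Hypothetical three-frame example (not data from the simulations).
sulfur = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
nitrogen = [(4.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

series = distance_series(sulfur, nitrogen)
amplitude = fluctuation_amplitude(series)
rate = mean_frame_change(series)
print(series, amplitude, rate)
```

With snapshots saved every 2 ps, the frame index multiplied by 2 gives the simulation time in ps, so the comparison "beyond 150 ps" corresponds to frames after index 75.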
It has been shown previously that, similarly to 3Cpro
[23,24], specific substrate binding by Mpro is ensured by
the well-defined S1 and S2 binding subsites [14]. In both
MproT and MproH, S2 is lined by the side chains of
His41, Thr47, Ile51, Leu164, and Pro188, except that
Leu164 of MproT is replaced by Ile in MproH. In
MproS, S2 is lined by the side chains of His41, Asp48,
Pro52, Met165, and Gln189. This indicates that S2 is not
as conserved as S1 among these proteins. It is worth
mentioning that the main chain of Leu164 of MproT
(or Ile164 of MproH or Met165 of MproS) forms part
of S1, while its side chain is involved in S2, indicating
that these two subsites influence each other to some
extent in substrate binding.

The analysis of the accessible surface areas (ASAs) of both
S1 and S2 during the MD simulations indicates that both
subsites are flexible enough to accommodate the substrates.
The snapshots of S1 and S2 for these proteins with the smallest
and largest ASAs sampled
from the 200 ps MD simulations are illustrated in
Fig. 5. Interestingly, the sizes and conformations
of the smallest and the largest S1 pockets of MproSH
are very similar to those of MproT. The variation in
the size and conformation of S2 for these proteins is
more significant than that of S1 during the MD simulations,
probably because part of S2 is fully exposed to the
solvent.

Fig. 5. Molecular surfaces of the substrate binding subsites S1 and S2 for MproT, MproH, MproST, and MproSH with the smallest and the largest
ASAs during MD simulations. The residues forming these subsites are shown in red.
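
Selecting the snapshots with the smallest and largest ASA is a simple search over per-frame values, as sketched below. The ASA values are hypothetical placeholders; computing the ASA itself requires a surface algorithm that is outside the scope of this sketch.

```python
def extreme_frames(asa_by_frame):
    """Return (index of smallest ASA, index of largest ASA)."""
    if not asa_by_frame:
        raise ValueError("at least one frame is required")
    indices = range(len(asa_by_frame))
    smallest = min(indices, key=lambda i: asa_by_frame[i])
    largest = max(indices, key=lambda i: asa_by_frame[i])
    return smallest, largest


# Hypothetical per-frame ASA values in Å², not results from this study.
toy_asa = [120.5, 98.2, 143.7, 110.0]
smallest_idx, largest_idx = extreme_frames(toy_asa)
print(smallest_idx, largest_idx)  # 1 2
```

The difference between the two selected values gives a quick estimate of how much a subsite opens and closes over the trajectory.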